feat(storage): shard repo-axis GSI1 partition (#23)#34
Conversation
There was a problem hiding this comment.
Code Review
This pull request implements GSI1 sharding for the REPO# partition to mitigate hot partition risks under Issue #23. It introduces a deterministic 16-shard key generator, a scatter-gather reader to merge sharded results, and an in-place backfill migration utility, while updating write mappers and repository stores to support the sharded schema. The review feedback highlights two key improvement opportunities: first, a potential NameError on Python versions prior to 3.11 in transport.py due to referencing DynamoDbCompatibleConfig inside its own class without future annotations; second, a performance issue in repo_axis_migration.py where redundant full table scans are performed to compute remaining legacy items, which can be optimized by setting remaining = failed.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
|
|
||
| @classmethod | ||
| def from_env(cls) -> "DynamoDbCompatibleConfig": | ||
| def from_env(cls) -> DynamoDbCompatibleConfig: |
There was a problem hiding this comment.
Since from __future__ import annotations is not imported at the top of this file, referencing DynamoDbCompatibleConfig as a return type annotation inside its own class definition will raise a NameError at import time on Python versions prior to 3.11.
To prevent runtime import errors on older Python versions, please revert this annotation to a string literal "DynamoDbCompatibleConfig", or add from __future__ import annotations at the very top of the file.
| def from_env(cls) -> DynamoDbCompatibleConfig: | |
| def from_env(cls) -> "DynamoDbCompatibleConfig": |
| skipped += 1 | ||
| else: | ||
| failed += 1 | ||
| remaining = len(_scan_legacy(table, entity_type)) |
There was a problem hiding this comment.
In DynamoDB, a FilterExpression is applied after the scan reads the data from disk but before returning it to the client. This means a full table scan always consumes the same amount of Read Capacity Units (RCUs) regardless of the filter.
Calling _scan_legacy(table, entity_type) again here to compute remaining triggers an additional full table scan for each entity type (6 extra scans in total). Since any item that was successfully backfilled or skipped is no longer legacy, the number of remaining legacy items is exactly equal to the number of failed items. You can directly assign remaining = failed to completely eliminate these 6 expensive scans.
| remaining = len(_scan_legacy(table, entity_type)) | |
| remaining = failed |
Spread per-repo rows across REPO#<repo>#SHARD#00..15 (repoAxisVersion=2, shardCount=16) to remove the single hot REPO#<repo> GSI1 partition. - RepoAxisKey + repo_axis_inputs: single source of truth for the per-entity sharded key formula; write mappers and backfill share it (no drift). - read_repo_axis: scatter-gather fan-out with migration-only include_legacy fallback, fail-closed on shard error, dedupe by (PK,SK) preferring higher repoAxisVersion, canonical (gsi1sk,PK,SK) order. - read_observations_for_repo / residual_for_repo route through the fan-out reader; direct primary-key reads unchanged. - repo_axis_migration: in-place conditional backfill (not row copy) with a per-entity inventory/backfilled/skipped/failed/remaining report and a documented compatibility-removal gate. - regression scan: single allowed construction site; bans any #SHARD# literal, raw gsi1pk=REPO#, and GSI1 reads on the raw REPO# partition. Implements docs/workbench/specs/issue-23-repo-gsi1-sharding/design.md (M1-M5) and the post-implementation multi-agent review fixes. Full suite green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
7ec55e7 to
de4facc
Compare
What
Shard the per-repo
REPO#<repo>GSI1 partition intoREPO#<repo>#SHARD#00..15(repoAxisVersion=2,repoAxisShardCount=16) soper-repo entity rows no longer collide on a single hot partition, while
repo-local reads stay logically identical.
Why
The single
REPO#<repo>GSI1 partition is a hot-partition risk at cloud scale(#23): every per-repo entity (
FINDING,FINDING_OBSERVATION,FINDING_STATE,STATE_EVENT,SCAN_LEDGER,GHAS_ALERT) writes to one partition key per repo.Changes
repo_axis.py:RepoAxisKey+repo_axis_inputs— single source of truth forthe per-entity sharded key formula, shared by write mappers and backfill so the
two cannot drift. Sole owner of the
REPO#/#SHARD#literals.items.py: all 6 repo-axis mappers build the base item, then merge the shardedGSI1 projection derived from the item's own fields.
STATE_EVENTgsi1skgainsan
eventSeqtie-breaker suffix; the rule-axis GSI2 is unchanged.repo_axis_reader.py:read_repo_axisscatter-gather fan-out across all shardswith a migration-only
include_legacyfallback; fails closed on a shard error,dedupes by
(PK, SK)preferring the higherrepoAxisVersion, returns canonical(gsi1sk, PK, SK)order.store.py/branch_residual.py:read_observations_for_repoandresidual_for_reporoute through the fan-out reader (include_legacythreaded,default
False). Direct primary-key reads (read_finding_state,has_scan_ledger,read_finding_state_events) are unchanged.repo_axis_migration.py: in-place conditional backfill (not a row copy) with aper-entity inventory / backfilled / skipped / failed / remaining report and a
documented compatibility-removal gate.
tests/test_repo_axis_sharding.py(helper, fan-out reader,migration, regression scan); updated store and CLI residual tests. The
regression scan bans any
#SHARD#literal, rawgsi1pk=REPO#, and GSI1 readson the raw
REPO#partition outside the single helper module.Implements
docs/workbench/specs/issue-23-repo-gsi1-sharding/design.md(M1–M5)plus the post-implementation multi-agent review fixes.
Test
uv run pytest— full suite green (616 passed)uv run ruff check/uv run ruff format— cleanChecklist
Closes #23